AITopics | preconditioned stochastic gradient langevin dynamic

Stochastic Gradient Langevin Dynamics (SGLD) injects isotropic gradient noise into SGD to help navigate pathological curvature in the loss landscape of deep networks. The isotropic nature of the noise leads to poor scaling, and adaptive methods based on higher-order curvature information, such as Fisher scoring, have been proposed to precondition the noise and achieve better convergence. In this paper, we describe an adaptive method to estimate the parameters of the noise and conduct experiments on well-known model architectures to show that the adaptively preconditioned SGLD method converges at the speed of adaptive first-order methods such as Adam and AdaGrad, and generalizes as well as SGD on the test set.
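As a rough illustration of the isotropic-noise update this abstract refers to, the sketch below runs plain SGLD on a toy target. It is a minimal sketch under stated assumptions, not the paper's method: the names `sgld_step` and `grad_log_post`, the standard-normal target, and the step size are all hypothetical, and the gradient is exact here where SGLD would normally use a minibatch estimate.

```python
import numpy as np

def sgld_step(theta, grad_log_post, step_size, rng):
    """One SGLD update: half a gradient step plus isotropic Gaussian noise."""
    # Noise variance equals the step size in every coordinate (isotropic).
    noise = rng.normal(0.0, np.sqrt(step_size), size=theta.shape)
    return theta + 0.5 * step_size * grad_log_post(theta) + noise

def grad_log_post(theta):
    # Assumed toy target: standard normal, log p(theta) = -0.5 * ||theta||^2.
    return -theta

rng = np.random.default_rng(0)
theta = np.full(2, 5.0)          # start far from the mode
samples = []
for t in range(20000):
    theta = sgld_step(theta, grad_log_post, 0.1, rng)
    if t >= 2000:                # discard burn-in
        samples.append(theta.copy())
samples = np.array(samples)
```

Because every coordinate receives noise of the same scale, a target whose coordinates differ widely in scale would force a step size small enough for the sharpest direction, which is the poor scaling the abstract describes.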

artificial intelligence, machine learning, noise, (14 more...)

arXiv.org Machine Learning

1906.04324

Country: North America > United States > California (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.92)


Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks

Li, Chunyuan (Duke University) | Chen, Changyou (Duke University) | Carlson, David (Columbia University) | Carin, Lawrence (Duke University)

AAAI Conferences, Apr-19-2016

Effective training of deep neural networks suffers from two main issues. The first is that the parameter space of these models exhibits pathological curvature. Recent methods address this problem by using adaptive preconditioning for Stochastic Gradient Descent (SGD). These methods improve convergence by adapting to the local geometry of parameter space. A second issue is overfitting, which is typically addressed by early stopping. However, recent work has demonstrated that Bayesian model averaging mitigates this problem. The posterior can be sampled by using Stochastic Gradient Langevin Dynamics (SGLD). However, the rapidly changing curvature renders default SGLD methods inefficient. Here, we propose combining adaptive preconditioners with SGLD. In support of this idea, we give theoretical properties on asymptotic convergence and predictive risk. We also provide empirical results for Logistic Regression, Feedforward Neural Nets, and Convolutional Neural Nets, demonstrating that our preconditioned SGLD method gives state-of-the-art performance on these models.
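The idea of combining an adaptive preconditioner with SGLD might be sketched as follows. This is a hedged illustration only: the abstract does not specify the preconditioner, so an RMSprop-style diagonal one is assumed here, the correction term that accounts for how the preconditioner changes with the parameters is omitted, and `psgld_step`, `alpha`, `lam`, and the toy Gaussian target are hypothetical choices.

```python
import numpy as np

def psgld_step(theta, v, grad, step_size, rng, alpha=0.99, lam=1e-3):
    """One preconditioned SGLD update with an assumed RMSprop-style diagonal preconditioner."""
    # Running average of squared gradients, one entry per parameter.
    v = alpha * v + (1.0 - alpha) * grad * grad
    # Diagonal preconditioner: larger in directions with small gradients.
    precond = 1.0 / (lam + np.sqrt(v))
    # Noise is scaled by the same preconditioner, so it is no longer isotropic.
    noise = rng.normal(size=theta.shape) * np.sqrt(step_size * precond)
    theta = theta + 0.5 * step_size * precond * grad + noise
    return theta, v

# Assumed toy target: zero-mean Gaussian with very different scales per coordinate.
scales = np.array([1.0, 10.0])

def grad_log_post(theta):
    return -theta / scales**2

rng = np.random.default_rng(0)
theta = np.zeros(2)
v = np.ones(2)                   # initial squared-gradient estimate (assumption)
trace = []
for _ in range(5000):
    theta, v = psgld_step(theta, v, grad_log_post(theta), 1e-2, rng)
    trace.append(theta.copy())
trace = np.array(trace)
```

Scaling both the gradient step and the noise by the same per-parameter factor lets flat directions take larger moves than sharp ones, which is the adaptation to local geometry that the abstract attributes to preconditioning.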

deep neural network, preconditioned stochastic gradient langevin dynamic, stochastic gradient langevin dynamic, (2 more...)

AAAI Conferences

Thirtieth AAAI Conference on Artificial Intelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks

Li, Chunyuan (Duke University) | Chen, Changyou (Duke University) | Carlson, David (Columbia University) | Carin, Lawrence (Duke University)

AAAI Conferences, Apr-19-2016

Effective training of deep neural networks suffers from two main issues. The first is that the parameter space of these models exhibits pathological curvature. Recent methods address this problem by using adaptive preconditioning for Stochastic Gradient Descent (SGD). These methods improve convergence by adapting to the local geometry of parameter space. A second issue is overfitting, which is typically addressed by early stopping. However, recent work has demonstrated that Bayesian model averaging mitigates this problem. The posterior can be sampled by using Stochastic Gradient Langevin Dynamics (SGLD). However, the rapidly changing curvature renders default SGLD methods inefficient. Here, we propose combining adaptive preconditioners with SGLD. In support of this idea, we give theoretical properties on asymptotic convergence and predictive risk. We also provide empirical results for Logistic Regression, Feedforward Neural Nets, and Convolutional Neural Nets, demonstrating that our preconditioned SGLD method gives state-of-the-art performance on these models.

artificial intelligence, bayesian inference, machine learning, (15 more...)

AAAI Conferences

Thirtieth AAAI Conference on Artificial Intelligence

Country: North America (0.28)

Genre:

Research Report (0.54)
Instructional Material (0.46)


Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks

Li, Chunyuan, Chen, Changyou, Carlson, David, Carin, Lawrence

arXiv.org Machine Learning, Dec-23-2015

Effective training of deep neural networks suffers from two main issues. The first is that the parameter spaces of these models exhibit pathological curvature. Recent methods address this problem by using adaptive preconditioning for Stochastic Gradient Descent (SGD). These methods improve convergence by adapting to the local geometry of parameter space. A second issue is overfitting, which is typically addressed by early stopping. However, recent work has demonstrated that Bayesian model averaging mitigates this problem. The posterior can be sampled by using Stochastic Gradient Langevin Dynamics (SGLD). However, the rapidly changing curvature renders default SGLD methods inefficient. Here, we propose combining adaptive preconditioners with SGLD. In support of this idea, we give theoretical properties on asymptotic convergence and predictive risk. We also provide empirical results for Logistic Regression, Feedforward Neural Nets, and Convolutional Neural Nets, demonstrating that our preconditioned SGLD method gives state-of-the-art performance on these models.

algorithm, gradient, psgld, (13 more...)

arXiv.org Machine Learning

1512.07666

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (1.00)



Adaptively Preconditioned Stochastic Gradient Langevin Dynamics
